Reducing the Cost of Dialogue System Training and Evaluation with Online, Crowd-Sourced Dialogue Data Collection
نویسندگان
چکیده
This paper presents and analyzes an approach to crowd-sourced spoken dialogue data collection. Our approach enables low cost collection of browser-based spoken dialogue interactions between two remote human participants (human-human condition) as well as one remote human participant and an automated dialogue system (human-agent condition). We present a case study in which 200 remote participants were recruited to participate in a fast-paced image matching game, and which included both human-human and human-agent conditions. We discuss several technical challenges encountered in achieving this crowd-sourced data collection, and analyze the costs in time and money of carrying out the study. Our results suggest the potential of crowdsourced spoken dialogue data to lower costs and facilitate a range of research in dialogue modeling, dialogue system design, and system evaluation.
منابع مشابه
Pair Me Up: A Web Framework for Crowd-Sourced Spoken Dialogue Collection
We describe and analyze a new web-based spoken dialogue data collection framework. The framework enables the capture of conversational speech from two remote users who converse with each other and play a dialogue game entirely through their web browsers. We report on the substantial improvements in the speed and cost of data capture we have observed with this crowd-sourced paradigm. We also ana...
متن کاملOn-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
متن کاملOn-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
متن کاملDoes History Help? An Experiment on How Context Affects Crowdsourcing Dialogue Annotation
Crowds of people can potentially solve some problems faster than individuals. Crowd sourced data can be leveraged to benefit the crowd by providing information or solutions faster than traditional means. Many tasks needed for developing dialogue systems such as annotation can benefit from crowdsourcing as well. We investigate how to outsource dialogue data annotation through Amazon Mechanical T...
متن کاملData Collection and Performance Evaluation of Spoken Dialogue Systems: the Mit Experience1
In this paper we report our efforts in data collection and performance evaluation in support of spoken dialogue system development. We describe two understanding metrics called query density and concept efficiency which can be interpreted on a perutterance basis, but which are measured over the course of a dialogue. We also describe the evaluation infrastructure we have developed to support off...
متن کامل